Abstract

We consider the problem of constructing confidence intervals for nonparametric
functional data analysis using empirical likelihood. In this doubly infinite-dimensional
context, we demonstrate Wilks's phenomenon and propose a bias-corrected
construction that requires neither undersmoothing nor direct bias estimation. We also
extend our results to partially linear regression involving functional data. Our
numerical results demonstrate the improved performance of empirical likelihood over
approximation based on asymptotic normality.

Keywords: Empirical likelihood; Confidence interval; Functional data analysis; Non- 
parametric regression; Wilks's theorem. 

1 Introduction 

Recently there has been an explosion of interest in the study of functional data,
where the independent variables in the statistical problem are curves, or more
abstractly, elements belonging to a metric space (Ramsay, 1982; Ramsay and Dalzell,
1991; Ferraty and Vieu, 2004). As a natural extension of multivariate data analysis,
functional data analysis provides valuable insights into these problems. Compared
with discrete multivariate analysis, functional analysis takes into account the
smoothness of the high-dimensional covariates, and often suggests new approaches to
the problems that have not been discovered before. Even for non-functional data, the
functional approach can often offer new perspectives on old problems.

This statistical area was first developed by extending parametric linear models
to the functional context, followed by the more recent literature on nonparametric
techniques, mostly focusing on the kernel regression method. For an introductory
exposition of this field, we refer the reader to the two monographs, Ramsay and
Silverman (2005) and Ferraty and Vieu (2006), for the parametric and nonparametric
approach respectively. More recently, there have also been some studies using
alternative nonparametric techniques such as reproducing kernel Hilbert spaces
(Preda, 2007; Hernandez et al., 2007), and dealing with the case where the dependent
variables are curves as well (Cuevas et al., 2002; Lian, 2007).

We are concerned here with the nonparametric regression problem using the
Nadaraya-Watson estimator. As pointed out in Ferraty and Vieu (2002), although
the extension of the Nadaraya-Watson estimator is relatively straightforward in
computational aspects, mathematical challenges arise from the fact that we are now
dealing with a doubly infinite-dimensional situation: both the regression function
and the covariate belong to an infinite-dimensional space. By now many studies have
appeared that extend various kernel regression results to the functional case, for
i.i.d. as well as weakly dependent data. Ferraty and Vieu (2004) established
the convergence rates for the kernel estimator, while asymptotic normality was
extended to the functional context in Ferraty et al. (2007) and Masry (2005) for i.i.d.
and strongly mixing sequences respectively. Nearest neighbor regression appeared in
Burba et al. (2009). Similar techniques have also been extended to semiparametric
problems such as partially linear models where the nonparametric part is estimated
by the Nadaraya-Watson estimator (Aneiros-Perez and Vieu, 2006). These studies
confirmed the applicability of traditional nonparametric methods in functional
contexts. All of these results also demonstrated the importance of appropriately dealing
with the data sparsity problem caused by the infinite dimensionality of the
covariates, and various semi-metrics have been proposed to alleviate this problem.

As is widely recognized in the statistical community, assessing the uncertainty
of an obtained estimator is an important step in any statistical analysis. This
aspect has so far been ignored in the literature on nonparametric functional
data analysis. The asymptotic normality of the Nadaraya-Watson estimator shown
in Ferraty et al. (2007), with explicit expressions for the bias and variance terms,
provides us with a mechanism for constructing asymptotically valid point-wise confidence
intervals. However, the expressions for bias and variance involve unknown parameters
that must be estimated from the data, which is itself a difficult problem, and
our simulations also show that the intervals constructed in this way
have poor coverage rates for finite sample sizes.

In this article, we propose to adapt the empirical likelihood method, first
introduced by Owen (1988), to construct point-wise confidence intervals for the regression
function. A major advantage of empirical likelihood is that it involves no
predetermined assumptions on the shape of the confidence interval, while the interval
constructed by normal approximation is always symmetric around the point estimator. It
is proved in DiCiccio et al. (1991) that empirical likelihood is Bartlett-correctable and
thus it has an advantage over another popular nonparametric method for
establishing confidence intervals, the bootstrap. The general properties of empirical
likelihood were studied by Owen (1990), and Qin and Lawless (1994) gave a much more
general theory of empirical likelihood with estimating equations. In the context
of kernel density estimation, Hall and Owen (1993) studied empirical likelihood
confidence bands, while Chen (1996) provided a finer analysis for point-wise confidence
intervals, and his simulations clearly demonstrated the improved coverage accuracy of
empirical likelihood over the percentile-t bootstrap.

The purpose of our study is to establish Wilks's phenomenon for empirical
likelihood in nonparametric functional regression with strongly mixing data. Although
empirical likelihood has been used in many different problems, the application of this
method in functional data analysis is new, and developing its asymptotic properties
is more involved due to the double infinite dimensionality problem mentioned above.
It is well known that in kernel regression, the constructed interval has a non-ignorable
bias when the optimal bandwidth for function estimation is used. Similar to the
intervals constructed by bootstrap, there are in general two approaches to address this
problem. One is to correct the bias by undersmoothing, and the other is to use the
explicit bias formula to shift the constructed intervals, with unknown quantities
estimated from the data and plugged into the expression. We find that in functional data
regression, the first approach aggravates the data sparsity problem: in our
simulations, using a small bandwidth makes the estimation very unstable, since very few
covariates are found within the smaller neighborhood of the testing covariates. For the
second approach, as mentioned above, the bias is difficult to estimate, as can be seen
from the expression for the bias discussed in more detail later. Thus we propose an
implicit method that first uses the optimal bandwidth to obtain an estimate of the
regression function and then uses the estimate to correct for bias in the estimating
equation. Our simulations show that the bias-corrected intervals have improved
coverage rates.

The organization of the paper is as follows. Section 2 presents the model for
empirical likelihood interval construction. We establish Wilks's theorem for empirical
likelihood with dependent data. This theoretical generality is required when we work
with time series data later. Then the bias-corrected interval is constructed, and we
also briefly discuss the extension to partially linear models, which greatly expands the
scope of the method. In Section 3, we use both simulated and real data to show the
improved accuracy of confidence intervals constructed by empirical likelihood over
those by normal approximation. Section 4 is devoted to a discussion of the results.
The technical proofs are collected in the Appendix.

2 Empirical likelihood inference for nonparametric functional analysis

2.1 Nonparametric functional model 

Let $\{Y_i, X_i\}_{i=1}^n$ be a stationary and ergodic process marginally distributed
as $(Y, X)$. The functional nonparametric regression model, introduced in Ferraty and
Vieu (2004), is defined as

$$Y_i = r(X_i) + \epsilon_i.$$

We assume that $\epsilon_i$ is a random variable with $E(\epsilon_i|X_i) = 0$ and
$Var(\epsilon_i|X_i) = \sigma^2(X_i)$. The dependent variable $Y_i$ is real-valued, and
the covariate $X_i$ is assumed to belong to some semi-metric vectorial space $H$
equipped with a semi-metric $d(\cdot,\cdot)$.

Estimation of the regression function $r$ is a crucial issue in the nonparametric
functional model, and Ferraty and Vieu (2004) proposed the adaptation of the
Nadaraya-Watson estimator to the functional context for a fixed $x_0$:

$$\hat r(x_0) = \frac{\sum_{i=1}^n K(d(X_i,x_0)/h)\,Y_i}{\sum_{i=1}^n K(d(X_i,x_0)/h)},$$

where $K$ is the kernel and $h$ is the bandwidth, which satisfies $h \to 0$ as $n \to \infty$.
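As an illustration of the estimator just defined, the following minimal Python sketch computes the kernel-weighted average for curves stored as lists of values on a common grid. The function names, the Riemann-sum distance and the grid spacing `dt` are assumptions made for this example only; any semi-metric can be passed in as `dist`.

```python
import math


def quadratic_kernel(u):
    """Kernel supported on [0, 1]: K(u) = 1 - u^2 there, 0 elsewhere."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def l2_distance(x, y, dt):
    """Riemann-sum L2 distance between two curves sampled on a common grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) * dt)


def nadaraya_watson(curves, responses, x0, h, dist, kernel=quadratic_kernel):
    """Kernel-weighted average of the responses for curves near x0."""
    weights = [kernel(dist(x, x0) / h) for x in curves]
    total = sum(weights)
    if total == 0.0:
        # No curve falls in the ball B(x0, h): the data sparsity problem.
        raise ValueError("no curve within distance h of x0; increase h")
    return sum(w * y for w, y in zip(weights, responses)) / total
```

A call such as `nadaraya_watson(curves, responses, x0, 0.25, lambda a, b: l2_distance(a, b, 0.2))` returns the estimate at `x0`; the explicit error for an empty neighborhood makes the effect of too small a bandwidth visible.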

The asymptotic properties of $\hat r$ crucially depend on the following so-called small
ball probabilities of the covariates, since these probabilities are directly related to
the data sparsity problem that often plagues infinite-dimensional models:

$$\phi(h) = P(X \in B(x_0,h)),$$

where $B(x_0,h) = \{x \in H, d(x,x_0) < h\}$ is a neighborhood of $x_0$. Note that since we
always consider a fixed $x_0$ for estimation throughout, we will omit the dependence
of $\phi(h)$ on $x_0$ in our notation. For weakly dependent data, the properties of the
estimator also depend on the pairwise joint distribution of the covariates, and we
define

$$\psi(h) = \sup_{i \neq j} P(X_i \in B(x_0,h), X_j \in B(x_0,h)).$$

Note that for an independent data sequence, we obviously have $\psi(h) = \phi(h)^2$.

For weakly dependent sequence data as dealt with in our current paper, rates of
almost complete convergence (slightly stronger than almost sure convergence) were
obtained in Ferraty and Vieu (2004, 2006). Later works by Masry (2005) and Ferraty et al.
(2007) showed that $\sqrt{n\phi(h)}(\hat r(x_0) - r(x_0) - b(x_0))$ is asymptotically
normal under suitable conditions. Using a smaller $h$, the bias term $b(x_0)$ can be made
to disappear at the cost of enlarged asymptotic variance.

2.2 Empirical likelihood inferences

There is an estimating equation representation for the Nadaraya-Watson estimator.
The estimator is actually the solution to the equation

$$\sum_{i=1}^n K_i(Y_i - \mu) = 0,$$

where we use $K_i = K(d(X_i,x_0)/h)$ to simplify the notation. The corresponding
population version is

$$EK(Y - \mu) = 0,$$

where $K = K(d(X,x_0)/h)$. Obviously the zero of the above equation is $\mu_0 =
EKY/EK$ instead of $r(x_0)$. Note that $\mu_0$ implicitly depends on $h$, which in turn
depends on $n$.

Now we can define a likelihood function for $\mu$ based on the empirical likelihood
principle as

$$L_n(\mu) = \max\left\{\prod_{i=1}^n p_i \,\Big|\, p_i \ge 0, \sum_{i=1}^n p_i = 1, \sum_{i=1}^n p_i K_i(Y_i - \mu) = 0\right\},$$

where we set $L_n(\mu) = 0$ if there is no $\{p_i\}_{i=1}^n$ satisfying the above
constraints. It is easily seen that the maximum empirical likelihood estimator coincides
with the Nadaraya-Watson estimator $\hat r(x_0)$ with $L_n(\hat r(x_0)) = n^{-n}$,
achieved when $p_i = 1/n$. The corresponding nonparametric likelihood ratio is given by

$$LR_n(\mu) = \max\left\{\prod_{i=1}^n (np_i) \,\Big|\, p_i \ge 0, \sum_{i=1}^n p_i = 1, \sum_{i=1}^n p_i K_i(Y_i - \mu) = 0\right\}.$$

Using the duality approach, the log likelihood ratio (multiplied by the constant $-2$)
becomes

$$lr_n(\mu) = -2\log LR_n(\mu) = 2\sum_{i=1}^n \log(1 + \lambda K_i(Y_i - \mu)),$$

where the Lagrange multiplier $\lambda$ solves

$$\sum_{i=1}^n \frac{K_i(Y_i - \mu)}{1 + \lambda K_i(Y_i - \mu)} = 0.$$
Now we give the first result of the paper.

Theorem 1 Suppose that conditions (c0)-(c8) in the Appendix hold. Then

(a) $lr_n(\mu_0) \to \chi^2_1$ in distribution.

(b) If in addition $nh^{2\alpha}\phi(h) \to 0$, then $lr_n(r(x_0)) \to \chi^2_1$ in
distribution, where $\alpha$ is defined in condition (c2) in the Appendix.

Thus, according to part (a) of the theorem, we can use the empirical likelihood
ratio to construct a $1 - \alpha$ confidence interval for $\mu_0$ as

$$\{\mu : lr_n(\mu) \le \chi^2_1(\alpha)\},$$

where $\chi^2_1(\alpha)$ is the $1 - \alpha$ quantile of the $\chi^2_1$ distribution. The
bandwidth $h$ can be chosen as the optimal bandwidth for estimating $r(x_0)$. In
particular, cross-validation can be used to choose a data-dependent bandwidth. In general,
as explained above, the constructed confidence interval is valid only for $\mu_0$ instead
of $r(x_0)$, while the latter is more commonly the focus of inference. Part (b) of the
theorem shows that the problem can be solved with undersmoothing. But in practice, how
to choose the bandwidth is at issue and the asymptotic theory does not provide a
solution. Besides, using a smaller bandwidth makes the data sparsity problem more
severe, which makes estimation of $r(x_0)$ unstable. More discussion and our solution
are presented in the next subsection.

The numerical problem is in general easy to tackle since it only involves one- 
dimensional root-finding. We use Brent's algorithm in our implementation which is 
faster than the simple bisection algorithm. 
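To make the one-dimensional root-finding concrete, here is a minimal sketch that evaluates the log likelihood ratio from the values $g_i = K_i(Y_i - \mu)$. It deliberately uses plain bisection rather than Brent's algorithm so that it needs only the standard library; the function name and the tolerances are assumptions of this sketch, not part of the original implementation.

```python
import math


def el_log_ratio(g, tol=1e-12, max_iter=200):
    """Return 2 * sum(log(1 + lam * g_i)), where lam solves sum g_i / (1 + lam * g_i) = 0.

    g holds the estimating-function values g_i = K_i * (Y_i - mu).
    Returns math.inf when 0 is not strictly inside the range of the nonzero g_i.
    """
    g = [v for v in g if v != 0.0]  # zero terms contribute nothing
    if not g or min(g) >= 0.0 or max(g) <= 0.0:
        return math.inf
    # lam must keep every 1 + lam * g_i positive.
    lo, hi = -1.0 / max(g), -1.0 / min(g)
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        return sum(v / (1.0 + lam * v) for v in g)

    # score is strictly decreasing in lam, so bisection brackets the unique root.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * v) for v in g)
```

A candidate value $\mu$ belongs to the 95% interval when `el_log_ratio([k * (y - mu) for k, y in zip(weights, responses)])` is at most 3.841, the 0.95 quantile of the $\chi^2_1$ distribution; the interval endpoints can then be located by a second one-dimensional search over $\mu$.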



2.3 Bias-corrected empirical likelihood 

To correct the bias when constructing the confidence interval for $r(x_0)$, we can
shift the interval according to the estimated magnitude of the bias. In the proof
of asymptotic normality of $\hat r(x_0)$ in Masry (2005), the bias is $E\hat r(x_0) - r(x_0) =
[EKY - r(x_0)EK]/EK \cdot (1 + o(1))$, which is in general difficult to estimate. For
i.i.d. data, Ferraty et al. (2007) made the following definition

$$f(s) = E(r(X) - r(x_0) \mid d(X,x_0) = s),$$

and imposed the assumption that the derivative $f'(0)$ exists. Under this assumption,
the bias is shown to be $f'(0)hM_0/M_1 \cdot (1 + o(1))$, with
$M_0 = K(1) - \int_0^1 (sK(s))'\tau(s)\,ds$ and $M_1 = K(1) - \int_0^1 K'(s)\tau(s)\,ds$,
where the definition of $\tau$ can be found in the Appendix. Because of the difficulty in
estimating $\tau$, to compute $M_0$ and $M_1$ explicitly the authors used the uniform
kernel $K(s) = I_{[0,1]}(s)$. In general, implicit in their calculation is the fact that
$M_0 = \lim_{h \to 0} E(d(X,x_0)K)/(h\phi(h))$ and $M_1 = \lim_{h \to 0} EK/\phi(h)$,
and estimators for $M_0$ and $M_1$ can be easily obtained using sample averages based on
this characterization. Nevertheless, in either case, estimating $f'(0)$ looks much more
intimidating.

In this work, we propose to use the bias-corrected estimating equation

$$\sum_{i=1}^n K_i(Y_i - \hat r(X_i) + \hat r(x_0) - \mu) = 0$$

and the corresponding empirical likelihood ratio

$$LR^*_n(\mu) = \max\left\{\prod_{i=1}^n (np_i) \,\Big|\, p_i \ge 0, \sum_{i=1}^n p_i = 1, \sum_{i=1}^n p_i K_i(Y_i - \hat r(X_i) + \hat r(x_0) - \mu) = 0\right\}.$$

The idea is that $K_i(Y_i - \hat r(X_i) + \hat r(x_0) - \mu)$ will be close to
$K_i(Y_i - r(X_i) + r(x_0) - \mu) = K_i(\epsilon_i + r(x_0) - \mu)$, which obviously gives
an unbiased estimating equation for $r(x_0)$. Formally, we have the following result,
which shows that the bias-corrected empirical likelihood ratio leads to a valid
confidence interval without the need for undersmoothing.

Theorem 2 Suppose that conditions (c0)-(c9) in the Appendix hold. Then $lr^*_n(r(x_0)) =
-2\log LR^*_n(r(x_0)) \to \chi^2_1$ in distribution.

Finally, we mention that the same bias correction can be used in confidence intervals
constructed with normal approximation theory, with the interval centered at the
solution to the bias-corrected estimating equation
$\sum_i K_i(Y_i - \hat r(X_i) + \hat r(x_0) - \mu) = 0$ instead of $\hat r(x_0)$.
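The bias-corrected estimating equation is linear in $\mu$, so its solution has the closed form $\sum_i K_i(Y_i - \hat r(X_i) + \hat r(x_0))/\sum_i K_i$. The sketch below computes this center under simplifying assumptions: the kernel is the quadratic kernel, `dist` is any user-supplied semi-metric, and all function names are hypothetical.

```python
def kernel(u):
    """Quadratic kernel on [0, 1], zero elsewhere (an assumption of this sketch)."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def nw(xs, ys, x, h, dist):
    """Nadaraya-Watson estimate at x."""
    w = [kernel(dist(xi, x) / h) for xi in xs]
    total = sum(w)
    if total == 0.0:
        raise ValueError("no observation within distance h of the target")
    return sum(wi * yi for wi, yi in zip(w, ys)) / total


def bias_corrected_center(xs, ys, x0, h, dist):
    """Solve sum_i K_i * (Y_i - r_hat(X_i) + r_hat(x0) - mu) = 0 for mu."""
    k = [kernel(dist(xi, x0) / h) for xi in xs]
    r0 = nw(xs, ys, x0, h, dist)
    # Fitted values are only needed where K_i > 0.
    adjusted = [
        ki * (yi - nw(xs, ys, xi, h, dist) + r0)
        for ki, xi, yi in zip(k, xs, ys)
        if ki > 0.0
    ]
    return sum(adjusted) / sum(k)
```

The same adjusted values $Y_i - \hat r(X_i) + \hat r(x_0) - \mu$, multiplied by $K_i$, are what enter the bias-corrected empirical likelihood ratio.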



2.4 Inferences for partially linear models 



Consider the semi-functional partially linear model (Engle et al., 1986; Aneiros-Perez
and Vieu, 2006)

$$Y_i = Z_i^T\beta + r(X_i) + \epsilon_i, \quad i = 1, \ldots, n,$$

where $Z_i = (Z_{i1}, \ldots, Z_{ip})^T$ is the $p$-dimensional covariate and
$\beta = (\beta_1, \ldots, \beta_p)^T$ is the coefficient for the linear part. We will use
$\beta_0$ to denote the true parameter in the following. As before, for the nonparametric
part, the covariate $X$ is of functional nature. The estimation of this model is based on
the following profiling approach. For a given $\beta$, the nonparametric part is
estimated by

$$\hat r(x,\beta) = \frac{\sum_{i=1}^n K_i(Y_i - Z_i^T\beta)}{\sum_{i=1}^n K_i},$$

where $K_i = K(d(X_i,x)/h)$. Using this definition, the linear coefficient $\beta_0$ is
estimated by profile least squares

$$\hat\beta = \arg\min_\beta \sum_{i=1}^n (Y_i - Z_i^T\beta - \hat r(X_i,\beta))^2.$$

After $\hat\beta$ is obtained, we estimate the nonparametric part by $\hat r(x) = \hat r(x,\hat\beta)$.
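Because $\hat r(X_i,\beta)$ is linear in $\beta$, the profile least squares criterion reduces to an ordinary least squares fit of the smoothed-out responses on the smoothed-out covariates. The following sketch shows this for a single linear covariate ($p = 1$), which is a simplification made here for brevity; the smoother matrix of Nadaraya-Watson weights $w_{ij}$ and the function name are assumptions of the example.

```python
def profile_least_squares(z, y, smoother):
    """Profile least squares estimate of beta for a scalar covariate (p = 1).

    smoother[i][j] holds the Nadaraya-Watson weight w_ij; each row sums to one.
    """
    def smooth(v):
        return [sum(w * vj for w, vj in zip(row, v)) for row in smoother]

    # Residuals after removing the kernel-smoothed part of z and of y.
    z_res = [zi - si for zi, si in zip(z, smooth(z))]
    y_res = [yi - si for yi, si in zip(y, smooth(y))]
    denom = sum(zr * zr for zr in z_res)
    if denom == 0.0:
        raise ValueError("z is fully explained by the smoother; beta is not identified")
    return sum(zr * yr for zr, yr in zip(z_res, y_res)) / denom
```

For $p > 1$ the same idea applies with the scalar ratio replaced by the solution of the normal equations for the residualized design.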

The paper of Aneiros-Perez and Vieu (2006) studied the i.i.d. case and showed that
$\sqrt{n}(\hat\beta - \beta_0)$ has an asymptotically normal distribution under mild
assumptions. Instead of repeating their assumptions and extending their arguments to
strongly mixing sequences, we directly impose the assumption that
$\|\hat\beta - \beta_0\| = O_p(n^{-1/2})$ for simplicity.
Using a plug-in approach, the empirical likelihood ratio can be defined as

$$\widetilde{LR}_n(\mu) = \max\left\{\prod_{i=1}^n (np_i) \,\Big|\, p_i \ge 0, \sum_{i=1}^n p_i = 1, \sum_{i=1}^n p_i K_i(Y_i - Z_i^T\hat\beta - \mu) = 0\right\},$$

and the bias-corrected version is

$$\widetilde{LR}^*_n(\mu) = \max\left\{\prod_{i=1}^n (np_i) \,\Big|\, p_i \ge 0, \sum_{i=1}^n p_i = 1, \sum_{i=1}^n p_i K_i(Y_i - Z_i^T\hat\beta - \hat r(X_i) + \hat r(x_0) - \mu) = 0\right\}.$$

Theorem 3 Suppose that conditions (c0)-(c9) in the Appendix hold, and
$\|\hat\beta - \beta_0\| = O_p(n^{-1/2})$. Then $\widetilde{lr}^*_n(r(x_0)) =
-2\log \widetilde{LR}^*_n(r(x_0)) \to \chi^2_1$ in distribution.



Previous studies (Shi and Lau, 2000; Lu, 2009; Chen and Cui, 2008) focused on
constructing confidence regions for the linear coefficient $\beta$. For our functional
model, similar confidence regions can be developed for the linear coefficients without
much difficulty, but we choose to focus on the nonparametric part to better match the
main theme of the paper.



3 Numerical results 
3.1 Simulation data 

We first use a simulated example to show the performance of the empirical likelihood 
based confidence interval and compare it to that based on normal approximation. We 
simulate i.i.d. samples from the nonparametric functional model using the following 
mean function 

$$r(x) = \int |x'(t)|(1 - \cos(\pi t))\,dt.$$

The random covariate curves are simulated from

$$X(t) = \sin(\omega t) + (a + 2\pi)t + b, \quad \omega \sim \mathrm{Unif}(0, 2\pi), \quad a, b \sim \mathrm{Unif}(0, 1),$$

and the noise $\epsilon_i$ is simulated from a $N(0, \sigma^2)$ distribution. This example
is the same as that used in Ferraty et al. (2007) to illustrate bootstrap bandwidth
selection.
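A minimal sketch of how such covariate curves could be generated is given below. The evaluation grid is not specified in the text, so the grid passed to the function is an assumption of the example, as are the function name and the choice to return the analytic first derivative $X'(t) = \omega\cos(\omega t) + a + 2\pi$ alongside the curve.

```python
import math
import random


def simulate_curve(grid, rng):
    """Draw one curve X(t) = sin(w t) + (a + 2 pi) t + b on the given grid.

    Returns the curve values and the values of its first derivative.
    """
    omega = rng.uniform(0.0, 2.0 * math.pi)
    a = rng.uniform(0.0, 1.0)
    b = rng.uniform(0.0, 1.0)
    values = [math.sin(omega * t) + (a + 2.0 * math.pi) * t + b for t in grid]
    deriv = [omega * math.cos(omega * t) + a + 2.0 * math.pi for t in grid]
    return values, deriv


# Example: one curve on a hypothetical grid of 101 points in [0, 1].
example_grid = [i / 100 for i in range(101)]
example_values, example_deriv = simulate_curve(example_grid, random.Random(0))
```

Having the derivative available in closed form is convenient here because the regression function depends on $x'(t)$; with observed curves it would have to be approximated numerically.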

We use $n = 200$ and $n = 500$ as well as $\sigma^2 = 0.5$ and $\sigma^2 = 2$, resulting
in a combination of four scenarios. We use the quadratic kernel $K(s) = 1 - s^2$,
$0 \le s \le 1$, and the $L_2$ distance between the first derivatives of the curves as the
semi-metric, since the true regression function directly depends on the first derivative.
The bandwidth is chosen using cross-validation, taking advantage of the npfda R software
publicly available online (http://www.lsp.ups-tlse.fr/staph/npfda).

n     sigma^2   EL       Normal   Corrected EL   Corrected Normal
200   0.5       0.912    0.887    0.933          0.901
                (0.85)   (0.90)   (0.87)
      2.0       0.914    0.879    0.930          0.895
                (1.17)   (1.28)   (1.15)
500   0.5       0.920    0.891    0.943          0.911
                (0.62)   (0.72)   (0.66)
      2.0       0.921    0.890    0.946          0.905
                (0.91)   (1.00)   (0.93)

Table 1: Simulation results for the constructed 95% confidence interval. The numbers
shown are the coverage accuracy and the average interval lengths (numbers in the
brackets).

For each simulation scenario, we randomly generated 100 testing curves and
constructed 95% confidence intervals for them using both empirical likelihood and
normal approximation, with bias either ignored or corrected based on the approach
presented in Section 2.3. The whole simulation process is repeated 50 times. The
percentage of times the interval covers the true $r(x)$ as well as the average interval
length is calculated. As shown by Ferraty et al. (2007), the asymptotic variance of
$\hat r(x_0)$ is $\sigma^2 M_2/(n\phi(h)M_1^2)$, with relevant constants $M_1$ and $M_2$
defined in the Appendix. For the normal approximation approach, we need to estimate the
variance as well as the constants $M_1$ and $M_2$. As shown in the Appendix we have
$M_1 = \lim_{h \to 0} EK/\phi(h)$ and $M_2 = \lim_{h \to 0} EK^2/\phi(h)$. Thus
$M_2/(n\phi(h)M_1^2)$ can be estimated by the sample version
$\sum_i K_i^2/(\sum_i K_i)^2$. The noise variance $\sigma^2$ will be estimated by the
mean squared residual $\hat\sigma^2 = \sum_i (Y_i - \hat r(X_i))^2/n$.
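The plug-in quantities above combine into a normal-approximation interval in a few lines. The sketch below assumes the kernel weights $K_i$ at $x_0$, the responses and the fitted values $\hat r(X_i)$ have already been computed; the function name and argument layout are hypothetical, and `z` defaults to the 0.975 standard normal quantile for a 95% interval.

```python
import math


def normal_interval(k, y, fitted, center, z=1.959964):
    """Normal-approximation interval around `center`.

    k: kernel weights K_i at x0; y: responses; fitted: r_hat(X_i) for each i.
    """
    n = len(y)
    # Mean squared residual as the noise variance estimate.
    sigma2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
    # Sample version of M_2 / (n * phi(h) * M_1^2).
    scale = sum(ki * ki for ki in k) / sum(k) ** 2
    half_width = z * math.sqrt(sigma2 * scale)
    return center - half_width, center + half_width
```

Passing either $\hat r(x_0)$ or the bias-corrected center as `center` gives the uncorrected or corrected normal interval; the width is the same in both cases.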

The simulation results shown in Table 1 demonstrate the superiority of empirical
likelihood based intervals. In all cases, the empirical likelihood method produces
better coverage and shorter intervals compared to the normal approximation based
method. We also observe that the bias-corrected intervals give improved coverage
accuracy.



3.2 Real data 

In this subsection we use two real datasets to illustrate the construction of confidence
intervals for functional data in nonparametric regression. First, our spectrometric
dataset contains as covariates 215 spectra of light absorbance as functions of the
wavelength, for 215 pieces of finely chopped meat. The dependent variable is the
percentage of fat in each piece of meat. In addition, we use the protein and moisture
content as two covariates of the linear part in a partially linear model, since a previous
study (Aneiros-Perez and Vieu, 2006) shows these additional covariates give better
prediction performance. The first 165 samples are used as training data, the remaining 50
samples are used as testing data, and 95% confidence intervals are constructed for the
testing data. The semi-metric used is the $L_2$ metric on the first derivatives of the
spectra curves, which gives the best prediction performance on this dataset as shown in
Aneiros-Perez and Vieu (2006). Since the convergence rate of the linear part is faster
than that of the nonparametric part, it seems reasonable as a rough approximation to shift
the confidence interval for $r(x)$ by $z^T\hat\beta$ and treat it as a confidence interval
for the entire regression function $z^T\beta + r(x)$.

Our El Nino time series dataset records the monthly sea surface temperature
from June 1950 up to May 2004 (648 months). We use the first 53 years as training data
and the final year's temperatures as our testing data. The j-th month temperature in
a certain year is predicted from a nonparametric regression model using the whole
previous year's observations treated as a curve. Thus the prediction for each month in
the future is based on a different regression model. For this dataset, we use the PCA
semi-metric (Ferraty and Vieu, 2006) with the first four principal components.

For both datasets, the constructed confidence intervals for the testing data are
shown in Figure 1 using empirical likelihood as well as normal approximation, both in
the bias-corrected version. For better visualization, the testing samples are sorted
according to the estimated responses. Consistent with our simulations, the normal
intervals are generally longer, with an average length of 2.69 and 0.78 for the two
datasets respectively, compared to the average length of 2.17 and 0.62 for the empirical
likelihood intervals. In these two datasets, although we have no way of assessing the
coverage rate of the constructed intervals, we believe the empirical likelihood intervals
are better based on our previous simulation results. The mean squared error (MSE)
for the bias-corrected estimator on the testing data is 3.78 and 4.04 in the two
datasets, which compares favorably with the MSE for the uncorrected estimator, 5.36
and 4.49 respectively.



Figure 1: Two-sided 95% confidence intervals constructed using both empirical
likelihood and normal approximation on the testing samples of (a) the spectrometric
dataset; (b) the El Nino dataset. The intervals constructed using normal
approximation are shifted slightly to the right for visualization. The solid circles
denote the observed responses.



4 Conclusions

The construction of confidence intervals should accompany any statistical analysis
where a point estimate is obtained. In this paper, we propose to use empirical
likelihood for such purposes for the nonparametric functional data model. The popularity
of empirical likelihood derives from the data-dependent shape of the constructed
confidence regions, resulting in typically better finite sample performance. The
asymptotic property of the empirical likelihood ratio is demonstrated and we also propose
a bias correction method that avoids undersmoothing or direct estimation of the bias. We
also extend our result to semi-functional partially linear models where the confidence
interval for the nonparametric part can be constructed. Our asymptotic results are
presented in the general context of strongly mixing data sequences and are thus
applicable to time series datasets.

Similar results can be obtained when the empirical likelihood is replaced by other
distance functions. A particularly simple alternative is the Euclidean log likelihood
ratio $\sum_i (np_i - 1)^2$ (Owen, 2001). In this case, one can show that the Lagrange
multiplier $\lambda$ can be eliminated to obtain

$$\sum_i (np_i - 1)^2 = \frac{\left(\sum_i K_i(Y_i - \mu)\right)^2}{\sum_i \left(K_i(Y_i - \mu) - \overline{KY} + \bar K\mu\right)^2},$$

where $\overline{KY} = \sum_i K_iY_i/n$ and $\bar K = \sum_i K_i/n$. Thus the Euclidean
likelihood has some advantage over the empirical likelihood in terms of computation.
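The computational advantage can be seen in a short sketch: with $g_i = K_i(Y_i - \mu)$ and $\bar g$ their average, the closed form is $(\sum_i g_i)^2/\sum_i (g_i - \bar g)^2$, so no root-finding is needed. The function name and the error raised for a degenerate denominator are assumptions of this example.

```python
def euclidean_log_ratio(k, y, mu):
    """Closed-form Euclidean likelihood ratio statistic at mu.

    k: kernel weights K_i at x0; y: responses.
    """
    g = [ki * (yi - mu) for ki, yi in zip(k, y)]
    g_bar = sum(g) / len(g)
    # g_i - g_bar equals K_i (Y_i - mu) - mean(K Y) + mean(K) mu.
    spread = sum((gi - g_bar) ** 2 for gi in g)
    if spread == 0.0:
        raise ValueError("all estimating-function values are equal")
    return sum(g) ** 2 / spread
```

The statistic is zero exactly when $\mu$ solves the estimating equation $\sum_i K_i(Y_i - \mu) = 0$, that is, at the Nadaraya-Watson estimate.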



Appendix: Proofs 

We first list some conditions assumed in the theorems: 

(c0) $P(LR_n(\mu) = 0) \to 0$, and similarly for $LR^*_n$, $\widetilde{LR}_n$ and
$\widetilde{LR}^*_n$, for either $\mu = \mu_0$ or $\mu = r(x_0)$.

(c1) The kernel $K$ is supported on $[0, 1]$, and its derivative $K'$ exists on $[0, 1]$.
Either $K$ is bounded and bounded away from zero on $[0, 1]$, or $K' < 0$ and bounded
away from zero on $[0, 1]$, and for $\delta > 0$ small enough,
$\int_0^\delta \phi(h)\,dh > C\delta\phi(\delta)$ for some constant $C > 0$.

(c2) The regression function $r(x)$ satisfies the Lipschitz condition:
$|r(x_1) - r(x_2)| \le C d(x_1,x_2)^\alpha$. The variance function $\sigma^2(x)$ is
continuous in a neighborhood of $x_0$.

(c3) The distributions of the random covariates $X$ and $Z$ are both compactly
supported, and the third conditional moment of $Y$ satisfies
$E(|Y|^3 \mid X = x) \le C < \infty$ in a neighborhood of $x_0$.

(c4) For all $0 \le s \le 1$, $\lim_{h \to 0} \phi(hs)/\phi(h) = \tau(s)$ for some function $\tau$.

(c5) $\psi(h)/\phi(h)^2$ is bounded.

(c6) The sequence $(Y_i, X_i)$ is $\alpha$-mixing with mixing coefficients $\alpha(\cdot)$
satisfying $\sum_l l^\gamma[\alpha(l)]^{1-2/\nu} < \infty$ for some $\nu > 2$ and
$\gamma > 1 - 2/\nu$.

(c7) There exists a sequence $v_n$ satisfying $v_n \to \infty$,
$v_n = o((n\phi(h))^{1/2})$ and $(n/\phi(h))^{1/2}\alpha(v_n) \to 0$.

(c8) $h \to 0$, $n\phi(h) \to \infty$, $nh^{2\alpha}\phi(h) = O(1)$.

(c9) The kernel is continuous on $(0, \infty)$.
Condition (c0) is the basic assumption for empirical likelihood. For example, in
nonparametric regression, it is equivalent to saying that there exist two samples $i$ and
$j$ such that $K_i > 0$, $K_j > 0$, $Y_i > \mu$ and $Y_j < \mu$. This of course can be
guaranteed with mild assumptions on the distribution of the error. We directly state it
as an assumption to avoid verification for different cases. Condition (c1) guarantees that
$EK(d(X,x_0)/h)$ has the same asymptotic order as $\phi(h)$ (see Lemma 4.3 and 4.4 in
Ferraty and Vieu (2006)). The Lipschitz assumption directly determines the order of
the bias of the Nadaraya-Watson estimator. The assumption on the support of $X$
and $Z$ is for technical reasons to simplify several arguments in the proofs, and can
be relaxed with more careful analysis. Condition (c4) is also stated in Ferraty et al.
(2007) and is used to get explicit expressions for the asymptotic bias and variance.
Conditions (c5) and (c6) are the standard conditions used to show that the sequence is only
weakly dependent so that the asymptotic properties of various statistics are similar to
the i.i.d. case. The existence of the sequence $v_n$ in condition (c7) is used for the
standard big-block small-block argument for weakly dependent data. As is well known, for
the Nadaraya-Watson estimator the bias is of order $O(h^\alpha)$ and the variance is of
order $O((n\phi(h))^{-1})$. Thus the asymptotically optimal bandwidth satisfies condition
(c8), which provides the correct trade-off between bias and variance. Although this is not
required for all of our results, it is assumed throughout for simplicity. Finally,
condition (c9) will only be used in the proof of Wilks's theorem for the bias-corrected
empirical likelihood, where the continuity of the kernel is required to show the bias can
be correctly eliminated. In particular, since we assume $K$ is supported on $[0, 1]$, this
implies $K(1) = 0$.

We first introduce the following constants:

$$M_1 = K(1) - \int_0^1 K'(s)\tau(s)\,ds,$$

$$M_2 = K^2(1) - \int_0^1 (K^2)'(s)\tau(s)\,ds.$$

These constants are defined in Ferraty et al. (2007), and actually from their
calculations, we find $M_1 = \lim_{h \to 0} EK/\phi(h)$ and $M_2 = \lim_{h \to 0} EK^2/\phi(h)$.
Proof of Theorem 1. (a) The general approach to show the convergence in distribution
of the empirical likelihood ratio is laid out in Hjort et al. (2009). In particular, we
only need to verify their assumptions (A1)-(A3), while assumption (A0) in their paper
is directly assumed in (c0). Thus our proof will be split into three steps.

Step 1. We show

$$\frac{\sum_{i=1}^n K_i(Y_i - \mu_0)}{\sqrt{nEK}} \to N\left(0, \frac{M_2}{M_1}\sigma^2(x_0)\right) \text{ in distribution.}$$

Since we deal with dependent data here, the proof of the central limit theorem is
more involved. The main ideas used include the big-block small-block trick, bounding
covariances using mixing coefficients, and finally showing asymptotic normality from
convergence of characteristic functions. We omit the details here since they are almost
identical to the proof in Masry (2005) of the asymptotic normality of $\hat r(x_0)$.
We only calculate the asymptotic mean and variance of $K(Y - \mu_0)/\sqrt{EK}$, which
gives us the expression in the asymptotic normal distribution above.

The mean is obviously zero and the variance is calculated as

$$Var(K(Y - \mu_0)/\sqrt{EK}) = \frac{1}{EK}EK^2(Y - \mu_0)^2 = \frac{1}{EK}(EK^2Y^2 - 2\mu_0EK^2Y + \mu_0^2EK^2).$$

We have

$$EK^2Y^2 = E(r^2(X)K^2) + E(\sigma^2(X)K^2) = (\sigma^2(x_0) + r^2(x_0) + o(1))EK^2,$$

by the continuity of $r(x)$ and $\sigma^2(x)$. Similarly, $EK^2Y = (r(x_0) + o(1))EK^2$. Thus
the variance is

$$Var(K(Y - \mu_0)/\sqrt{EK})$$
$$= \{(\sigma^2(x_0) + r^2(x_0) + o(1))EK^2 - 2\mu_0 r(x_0)EK^2 + \mu_0^2EK^2\}/EK$$
$$= (\sigma^2(x_0) + (r(x_0) - \mu_0)^2 + o(1))EK^2/EK$$
$$\to \sigma^2(x_0)M_2/M_1.$$

Step 2. We show $\sum_i K_i^2(Y_i - \mu_0)^2/(nEK) \to \sigma^2(x_0)M_2/M_1$ in probability.
As calculated in Step 1, $E[K^2(Y - \mu_0)^2/EK] \to \sigma^2(x_0)M_2/M_1$ and the result
follows from the ergodic theorem.

Step 3. We show $\max_{1 \le i \le n} |K_i(Y_i - \mu_0)|/\sqrt{nEK} \to 0$ in probability.
We have by the union bound and Markov inequality that

$$P(\max_i |K_i(Y_i - \mu_0)| > \delta\sqrt{nEK}) \le nP(|K(Y - \mu_0)|^3 > \delta^3(nEK)^{3/2}) = O\left(\frac{1}{\sqrt{nEK}}\right),$$

using condition (c3).

Now the conditions (A1)-(A3) in Hjort et al. (2009) are verified, with the asymptotic
variance in Step 1 the same as the limiting constant in Step 2, and part (a) of the
theorem is proved.

(b) The proof is very similar to (a). In particular, the calculations in Steps 2 and
3 above remain exactly the same with $\mu_0$ replaced by $r(x_0)$. For Step 1, we have

$$\frac{\sum_i K_i(Y_i - r(x_0))}{\sqrt{nEK}} = \frac{\sum_i K_i(Y_i - \mu_0)}{\sqrt{nEK}} + \frac{\sum_i K_i(\mu_0 - r(x_0))}{\sqrt{nEK}}.$$

We note that the term $\mu_0 - r(x_0)$ is the same as the bias term in estimating $r(x_0)$
and thus $|\mu_0 - r(x_0)| = O(h^\alpha) = o((n\phi(h))^{-1/2})$ by the results in Ferraty
and Vieu (2004) and the assumption $nh^{2\alpha}\phi(h) \to 0$. The following Lemma 1
immediately implies that $\sum_i K_i(Y_i - r(x_0))/\sqrt{nEK}$ and
$\sum_i K_i(Y_i - \mu_0)/\sqrt{nEK}$ have the same asymptotic distribution.

Lemma 1 For any random sequence $a_i$, $1 \le i \le n$, we have
$\sum_i K_ia_i/\sqrt{nEK} = o_p(1)$ if $\max_i |a_i| = o_p((n\phi(h))^{-1/2})$.

Proof of Lemma 1. $|\sum_i K_ia_i|/(nEK) \le \max_i |a_i| \cdot \sum_i K_i/(nEK) =
O_p(\max_i |a_i|) = o_p((n\phi(h))^{-1/2})$ since $\sum_i K_i/(nEK) \to 1$ by the ergodic
theorem. Combine this with the fact that $EK$ is of the same order as $\phi(h)$ to get the
result.

Proof of Theorem 2. As in Theorem 1, we split the proof into three steps. Steps 2 and
3 still remain the same with $K_i(Y_i - \mu_0)$ replaced by
$K_i(Y_i - r(x_0) - \hat r(X_i) + \hat r(x_0))$, using the fact that
$|\hat r(X_i) - \hat r(x_0)| = o_p(1)$ when $K_i > 0$. We only need to replace
Step 1 to show that $\sum_i K_i(Y_i - r(x_0) - \hat r(X_i) + \hat r(x_0))/\sqrt{nEK} \to
N(0, \sigma^2(x_0)M_2/M_1)$.

To simplify notation, we set $K_{ij} = K(d(X_i,X_j)/h)$ and $w_{ij} = K_{ij}/\sum_j K_{ij}$.
We note that even though $w_{ij} \neq w_{ji}$, we have $w_{ij} = w_{ji}(1 + o_p(1))$.
Similarly, let $w_i = K_i/\sum_j K_j = K(d(X_i,x_0)/h)/\sum_j K(d(X_j,x_0)/h)$. We have the
decomposition

$$\sum_i K_i(Y_i - r(x_0) - \hat r(X_i) + \hat r(x_0))$$
$$= \sum_i K_i(Y_i - r(X_i)) + \sum_i K_i(r(X_i) - r(x_0) - \hat r(X_i) + \hat r(x_0))$$
$$= \sum_i K_i(Y_i - r(X_i)) - \sum_i K_i\Big(\sum_j w_{ij}Y_j - \sum_j w_jY_j - r(X_i) + r(x_0)\Big)$$
$$= \sum_i K_i(Y_i - r(X_i)) - \sum_i K_i\Big(\sum_j w_{ij}\epsilon_j - \sum_j w_j\epsilon_j\Big)$$
$$\quad - \sum_i K_i\Big(\sum_j w_{ij}r(X_j) - \sum_j w_jr(X_j) - r(X_i) + r(x_0)\Big)$$
$$= (I) - (II) - (III).$$

Dealing with $(I)$, similar to the proof of Step 1 in Theorem 1, we have that
$\sum_i K_i(Y_i - r(X_i))/\sqrt{nEK}$ converges to $N(0, \sigma^2(x_0)M_2/M_1)$ in
distribution. We then only need to show $(II) = o_p(\sqrt{nEK})$ and
$(III) = o_p(\sqrt{nEK})$. This is done in the following two lemmas for better
readability.

Lemma 2 $\sum_i K_i(\sum_j w_{ij}\epsilon_j - \sum_j w_j\epsilon_j) = o_p(\sqrt{nEK})$.

Proof of Lemma 2. Denote the expression on the left hand side by $g_n$. Writing
$g_n = \sum_j \epsilon_j(\sum_i K_iw_{ij} - K_j)$, we bound the variance of $g_n$
conditional on $\{X_i, i = 1, \ldots, n\}$ term by term.
To see more clearly the order of each term in the sum, for a fixed $j$, we set
$m(x) := K(d(x,x_0)/h)$. Then $\sum_i (K_iw_{ij}) - K_j = \sum_i [(w_{ji}(1 + o_p(1)))m(X_i)]
- m(X_j) = o_p(1)$ due to the continuity of $m$.

Thus we have $E(g_n^2 \mid \{X_i\}) = Var(g_n \mid \{X_i\}) = o_p(nEK)$, which implies
$g_n = o_p(\sqrt{nEK})$.

Lemma 3 $\sum_i K_i(\sum_j w_{ij}r(X_j) - \sum_j w_jr(X_j) - r(X_i) + r(x_0)) = o_p(\sqrt{nEK})$.

Proof of Lemma 3. The left hand side is equal to

$$\sum_j \Big\{\sum_i K_i(1 + o(1))w_{ji}(r(X_j) - r(X_i)) - K_j(r(X_j) - r(x_0))\Big\}.$$

Fix any $j$ and set $m(x) := K(d(X_j,x)/h)(r(X_j) - r(x))/h^\alpha$. Note that $m$ is
bounded due to the Lipschitz condition (c2) for $r$. Each term above inside the sum over
$j$ is rewritten as $\{\sum_i w_{ji}(1 + o_p(1))m(X_i) - m(x_0)\}h^\alpha = o_p(h^\alpha) =
o_p(1/\sqrt{nEK})$ by the same argument as in Lemma 2 as well as condition (c8). Thus the
left hand side of the expression in the statement of the lemma is $o_p(\sqrt{nEK})$.

Proof of Theorem 3. With the partially linear model

$$Y_i - Z_i^T\beta = r(X_i) + \epsilon_i,$$

we see that the empirical likelihood ratio in Theorem 3 with the true $\beta_0$ replacing
$\hat\beta$ will have the desired convergence property by applying Theorem 2 to
$Y_i - Z_i^T\beta_0$ instead of $Y_i$. Using the boundedness of $Z_i$ and the assumption
that $\|\hat\beta - \beta_0\| = O_p(n^{-1/2})$, together with Lemma 1, we can still verify
the three steps in the proofs of Theorems 1 and 2, and thus the empirical likelihood
ratios defined using either $\beta_0$ or $\hat\beta$ are asymptotically equivalent.